
Journal of KIISE


Korean Title: 분산병렬 클러스터 컴퓨팅을 이용한 GVCF(Genome Variant Call Format) 파일의 정렬 및 병합 방법
English Title: A Sort and Merge Method for Genome Variant Call Format (GVCF) Files using Parallel and Distributed Computing
Authors: JinWoo Lee (이진우), Jung-Im Won (원정임), JeeHee Yoon (윤지희)
Citation: Vol. 48, No. 3, pp. 358-367 (March 2021)
Korean Abstract (translated)
With the advance of next-generation sequencing (NGS) techniques, distributed and parallel processing of massive genomic data is emerging as an essential methodology. Because of the data volume, NGS genomic data processing generally requires a very long execution time. In this paper, we propose a new GVCF file sorting/merging module that uses distributed parallel cluster computing to shorten the execution time of GVCF file sorting/merging. The proposed module uses Spark as its distributed parallel cluster and performs sorting/merging in two stages that take the characteristics of GVCF files into account, in order to use the resources in the cluster efficiently. For performance evaluation, we measured and compared the sorting/merging execution times of GATK's CombineGVCFs module and the proposed module as the number of GVCF files varies. The experimental results confirm that the proposed approach shortens execution time very efficiently and demonstrate its usefulness.
English Abstract
With the development of next-generation sequencing (NGS) techniques, a large volume of genomic data is being produced and accumulated, and parallel and distributed computing has become an essential tool for processing it. NGS data processing generally entails two main steps: obtaining read alignment results in BAM format, and extracting variant information in genome variant call format (GVCF) or variant call format (VCF). Each step requires a long execution time because of the size of the data. In this study, we propose a new GVCF file sorting/merging module that uses a distributed parallel cluster to shorten the execution time. The proposed module runs on Spark, and the sorting/merging process is performed in two steps, chosen according to the structural characteristics of GVCF files, in order to use the cluster's resources efficiently. Performance was evaluated by comparing the sorting and merging execution time of our method with that of GATK's CombineGVCFs module on multiple GVCF files. The results indicate that the proposed method is effective in reducing execution time, and that it can serve as a scalable distributed computing tool for the GVCF file sorting/merging problem.
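The abstract does not say what the two steps are, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes step one routes records to partitions keyed by contig, and step two sorts each partition by position and merges the samples found at the same site. It is written in plain Python rather than Spark so that it runs without a cluster. The record layout and the function names (`partition_by_contig`, `sort_and_merge`, `merge_gvcf_records`) are hypothetical, and real GVCF reference blocks (intervals with an END position) are ignored.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical simplified record: (contig, position, sample_name, genotype).
# Real GVCF lines carry many more fields and interval-style reference blocks.
Record = Tuple[str, int, str, str]
Site = Tuple[int, Dict[str, str]]


def partition_by_contig(files: Iterable[List[Record]]) -> Dict[str, List[Record]]:
    """Assumed step 1: route every record to a partition keyed by contig."""
    partitions: Dict[str, List[Record]] = defaultdict(list)
    for records in files:
        for rec in records:
            partitions[rec[0]].append(rec)
    return partitions


def sort_and_merge(partition: List[Record]) -> List[Site]:
    """Assumed step 2: sort one partition by position, then merge samples per site."""
    merged: List[Site] = []
    for _, pos, sample, genotype in sorted(partition, key=lambda r: r[1]):
        if merged and merged[-1][0] == pos:
            # Same site as the previous record: add this sample's genotype.
            merged[-1][1][sample] = genotype
        else:
            merged.append((pos, {sample: genotype}))
    return merged


def merge_gvcf_records(files: Iterable[List[Record]],
                       contig_order: List[str]) -> Dict[str, List[Site]]:
    """Run both assumed steps and return merged sites per contig, in contig order."""
    partitions = partition_by_contig(files)
    return {c: sort_and_merge(partitions[c]) for c in contig_order if c in partitions}


# Two tiny made-up inputs, one per sample, deliberately out of order.
sample_a = [("chr1", 100, "A", "0/1"), ("chr2", 50, "A", "0/0")]
sample_b = [("chr2", 50, "B", "1/1"), ("chr1", 30, "B", "0/0")]
result = merge_gvcf_records([sample_a, sample_b], ["chr1", "chr2"])
print(result)
```

Each partition is processed independently in step two, which is the property that would let a cluster framework such as Spark distribute the work across workers.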
Keywords: next-generation sequencing (NGS); variant analysis; Genome Variant Call Format (GVCF) file sort/merge; Spark; parallel/distributed computing